Corpus-analysis for NLG
Abstract
There is general interest in corpora of human-authored texts as a source of domain knowledge for natural language generation (NLG) systems. It is less clear, however, how such knowledge can be acquired systematically. We propose a principled approach to acquiring domain knowledge through corpus analysis and illustrate its application in the domain of route descriptions. More specifically, we identify the different types of knowledge needed in the NLG process and describe a procedure for systematically analyzing a corpus text and inventorying these types of knowledge. We discuss how these procedures fit into a global approach to corpus analysis and into the NLG system development cycle.
Similar resources
Corpus-Driven Generation of Weather Forecasts
In traditional natural language generation (NLG), careful analysis of a corpus of example texts and determining the single correct sublanguage behind it is seen as one of the main tasks of the NLG system builder. In practice, this often means elimination of variation in the corpus and specification of conditions for rule application to the point where an NLG system becomes (virtually) determini...
Efficient algorithm for Context Sensitive Aggregation in Natural Language generation
Aggregation is a sub-task of Natural Language Generation (NLG) that improves the conciseness and readability of the text output by NLG systems. To date, approaches to the aggregation task have been predominantly manual (manual analysis of a domain-specific corpus and development of rules). In this paper, a new algorithm for aggregation in NLG is proposed that learns context sensitive a...
Corpus-Based Methods in Natural Language Generation: Friends or Foe? (invited talk)
In computational linguistics, the 1990s were characterized by the rapid rise to prominence of corpus-based methods in natural language understanding (NLU). These methods include statistical and machine-learning approaches. In natural language generation (NLG), in the meantime, there was little work using statistical and machine-learning approaches. Some researchers felt that the kind of am...
What is in a text and what does it do: Qualitative Evaluations of an NLG system - the BT-Nurse - using content analysis and discourse analysis
Evaluations of NLG systems are generally quantitative, that is, based on corpus comparison statistics and/or the results of experiments with people. Outcomes of such evaluations are important in demonstrating whether or not an NLG system is successful, but leave gaps in understanding why this is the case. Alternatively, qualitative evaluations carried out by experts provide knowledge on where a syst...
Lexical Parameters, Based on Corpus Analysis of English and Swedish Cancer Data, of Relevance for NLG
This paper reports on a corpus-based, contrastive study of Swedish and English medical language in the cancer sub-domain. It focuses on the examination of a number of linguistic parameters differentiating two types of cancer-related textual material, one intended for medical experts and one for laymen. Language-dependent and language-independent characteristics of the textual data betwee...
Publication date: 2003